ABSTRACT

A statistical method has been developed for nested 
incomplete samples in a longitudinal study in which part of the 
sample has dropped out in such a way that the data have a nested 
pattern. A procedure which performed well in a Monte Carlo experiment 
was extended to a two-factor incomplete design with repeated measures 
on one factor. Methods designed for this type of analysis included 
two multivariate techniques--likelihood ratio and step-down 
procedures with between subjects and within subjects variables--and 
univariate methods involving use of the EM algorithm in conjunction 
with restricted maximum likelihood estimation of variance components. 
Nested data were subjected to analysis of variance and step-down 
tests. It was concluded that step-down statistics were simple to use 
and did not require restrictive assumptions--type H covariance 
matrix--of univariate analysis of variance. They were appropriate for 
a mixed model, and calculation software was available. Step-down 
statistics were preferable to analysis of variance when the sample 
was nested and assumption violations were likely. References and four 
data tables conclude the document. (GDC) 
Introduction


In the behavioral and health sciences, it is common to encounter
research where subjects are observed across an extended period of
time. Treatment of longitudinal data is often approached as a
univariate problem, using a repeated measures analysis of
variance. Because of assumptions required by this approach, some
prefer to consider the analysis of longitudinal data as a
multivariate problem. Either approach is relatively simple to
carry out with current computer software, unless data values are
lost and the layout becomes incomplete. If missing values occur
in a non-random manner, they may bias results to a point where
the experiment is invalidated. Even if subjects "drop out" at
random (Rubin, 1976), analysis of incomplete data is a difficult
task.


Halperin (1984), in a Monte Carlo experiment, compared several
inferential procedures developed for a single nested incomplete
sample. One procedure which performed well in the Monte Carlo
study has been extended to a two-factor incomplete design with
repeated measures on one factor. It is my objective to illustrate
the use of this inferential method, which is appropriate to a
class of incomplete layouts, including user-oriented explanations
and computer software considerations.


Inferential Methods 


Inferences from incomplete data are often simplified when values
are missing in a pattern. Because of this, much of the literature
is concerned with special cases, as is well summarized by Little
(1976). One special pattern occurs commonly in data collected
over time. When subjects "drop out" and fail to return, by
reordering the observations we can form a table which displays a
triangular pattern. Such a pattern is termed "nested" or
"monotone" in the literature, and represents an important special
case.
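
The reordering described above can be sketched in a few lines of Python. This is only an illustration: the function names and the use of None to mark a missing value are assumptions made here, not part of the original analysis.

```python
def is_monotone(row):
    """True if no observed value follows a missing one (None) in the row."""
    seen_missing = False
    for value in row:
        if value is None:
            seen_missing = True
        elif seen_missing:
            return False  # the subject returned after dropping out
    return True


def nest(rows):
    """Order subjects from most to fewest observed trials.

    With monotone dropout, the reordered table shows the triangular
    ("nested") pattern of observed values.
    """
    if not all(is_monotone(row) for row in rows):
        raise ValueError("data are not nested: a subject returned after dropout")
    return sorted(rows,
                  key=lambda row: sum(v is not None for v in row),
                  reverse=True)


# Hypothetical scores for three subjects over three trials.
table = nest([[1, None, None], [1, 2, 3], [1, 2, None]])
```

A subject who misses one trial and later returns breaks the pattern, which is why the sketch refuses such data rather than silently reordering it.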


Two inferential methods studied by Halperin (1984) were devised
for nested data. One was a likelihood ratio statistic derived by
Bhargava (1962, 1975). A second, closely related to the
likelihood ratio test, is one based on step-down procedures (Roy
and Bargmann, 1958; Roy, 1958). The latter has been extended to a
design with one "Between Subjects" variable and one "Within
Subjects" variable.


Both the likelihood ratio and step-down tests are multivariate
analyses. When performed on full data, the likelihood ratio
statistic equals the Wilks' Lambda statistic used in multivariate
analysis of variance. Univariate procedures for incomplete data
have also been developed and are discussed next.

Yates (1933) and Bartlett (1937) suggested procedures to analyze
randomized blocks and split-plot designs when data values were
missing at random. With current software, such analyses are
easily performed, using general linear models programs. It should
be noted, however, that these procedures were developed for fixed
effects models, where randomness is provided by the random
assignment of treatments to experimental units. This is
contrasted with the repeated measures design, where "treatments"
are really trials over time, and "blocks" are randomly selected
subjects. For such data, a mixed, rather than fixed, model is
required. Laird and Ware (1982) consider random and mixed designs
appropriate to longitudinal data. They recommend use of the EM
algorithm (Dempster, Laird, and Rubin, 1977) in conjunction with
restricted maximum likelihood estimation of variance components
and provide an extremely flexible model for analysis of data
collected over time.


Practical Considerations 


In selecting an analysis to use on incomplete data, we must
consider a series of "trade-offs." Given valid procedures, we
should choose among them based on several considerations. The
first deals with simplicity. It is not sufficient that the
consulting statistician understand the method: the person who
"owns" the data, and is ultimately responsible for their
analysis, must understand the statistics, at least on an
intuitive level. Such is possible with step-down tests, but less
so with the likelihood ratio test of Bhargava, and still less so
with use of the method of Laird and Ware (1982). Step-down
procedures, though far from trivial, are well documented in the
multivariate literature and are the most "user-friendly" of the
tests considered. They may not be the most powerful, however.


A second practical consideration is availability of computer
software to calculate the statistics and their probabilities.
Here again, step-down procedures are relatively good, while
others suffer from a lack of computer software designed to
perform the analysis.


Next, various methods of analysis are illustrated, using
currently popular statistical packages.


Analysis of Variance 


To illustrate the various inferential procedures, an example was
taken from Cochran and Cox (1957, p. 300). The Between Subjects
variable, Conditions, has three levels, and the Within Subjects
variable, Trials, has six levels. Values were deleted at random
to form a nested pattern within each condition. Table 1 contains
these three nested samples, after observations are reordered to
display the nested pattern.




Let
    Y{ijk} = μ + γ{j} + π{i(j)} + τ{k} + γτ{jk} + τπ{ki(j)}
be the usual split-plot factorial analysis of variance model for
Subject i in Condition j at Trial k, subject to the conventional
restrictions that effects sum to zero. We test the hypotheses
    H(C):  all γ{j} = 0
    H(T):  all τ{k} = 0
    H(CT): all γτ{jk} = 0.

For each subject with no missing values, we average to obtain

    Y{ij.} = μ + γ{j} + π{i(j)}

This can be recognized as a one-way completely randomized model.
The analysis of variance on Y{ij.}√p will provide the Between
Subjects portion of the split-plot source table. Scaling by the
square root of p yields results traditionally reported in a
split-plot source table. These results are reported in Table 2.
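
As a rough illustration of the Between Subjects step, the following Python sketch runs a one-way analysis of variance on subject averages scaled by the square root of p. The function name and the sample averages are hypothetical; the results in Table 2 were obtained with PROC GLM, not with this code.

```python
import math


def between_subjects_anova(means_by_condition, p):
    """One-way ANOVA on complete-case subject averages scaled by sqrt(p).

    means_by_condition: one list of subject averages per condition.
    p: number of trials averaged for each subject.
    """
    scaled = [[m * math.sqrt(p) for m in grp] for grp in means_by_condition]
    n = sum(len(grp) for grp in scaled)
    grand = sum(sum(grp) for grp in scaled) / n
    ss_between = sum(len(grp) * (sum(grp) / len(grp) - grand) ** 2
                     for grp in scaled)
    ss_within = sum((v - sum(grp) / len(grp)) ** 2
                    for grp in scaled for v in grp)
    df_between = len(scaled) - 1
    df_within = n - len(scaled)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {"ss_between": ss_between, "ss_within": ss_within,
            "df": (df_between, df_within), "F": f_stat}


# Hypothetical subject averages for three conditions, p = 6 trials.
result = between_subjects_anova([[1, 2, 3], [2, 4, 3], [5, 6, 4]], p=6)
```

The scaling multiplies every sum of squares by p, so the sums of squares match the split-plot convention while the F ratio is the same as for the unscaled averages.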


Hypotheses T and CT consider main effects for Trials and
interactions of Conditions and Trials, respectively. These tests,
reported in Table 2, result from fitting the split-plot model to
all available data. I used PROC GLM on SAS (1982) to find the
values in Table 2.


Step-Down Tests 


There are several weaknesses inherent in the analysis of variance
reported in Table 2. First, the methods used to calculate the
analysis presume a fixed effects design, and that is unrealistic.
Second, the analysis requires that the covariance matrices be
homogeneous, and that the common covariance matrix be "Type H"
(Huynh and Feldt, 1970). If the latter condition is not true, the
F-tests for T and CT become quite unstable. Although it might be
anticipated that they would turn liberal, recent information
(LaLonde, 1985) suggests that this may not always be the case
with nested incomplete data.


Alternative means of testing Hypotheses T and CT, without
requiring assumptions which yield Type H covariance matrices, are
available. Step-down tests, when applied to full data, provide
test statistics "which are statistically independent but depend
upon an a priori ordering of criterion measures" (Finn, 1974, p.
157). Further discussion may be found in Subbaiah and Mudholkar
(1978).


As they are with full data, it has been demonstrated that
step-down statistics are statistically independent when applied
to nested incomplete data. This fact stems from our ability to
factor likelihood functions when the data form a nested pattern.
Details of this have been observed by many, including Anderson
(1957), Hocking and Marx (1979), Morrison (1976), and Marini,
Olson, and Rubin (1980).


Before testing hypotheses T and CT, we transform the data as is
done in profile analysis (e.g., Morrison, 1976, p. 146).
Differences

    D{ijk} = Y{ijk} - Y{ij,k+1}
           = (τ{k} - τ{k+1}) + (γτ{jk} - γτ{j,k+1}) +
             (τπ{ki(j)} - τπ{k+1,i(j)})

are formed for k=1,...,p-1, or 5 in our example, and are reported
in Table 3. Treating D{ijk} as p-1 dependent variables in a
multivariate analysis, we can test that the (p-1) grand means,
τ{k} - τ{k+1}, are zero, and that the (p-1) main effects,
γτ{jk} - γτ{j,k+1}, are zero. These are precisely H(T) and H(CT),
and can also be tested with step-down statistics.
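
The successive differences are simple to compute. The short Python sketch below assumes each subject's scores are stored as a list with None marking trials after dropout; the function name and the sample scores are illustrative only.

```python
def successive_differences(scores):
    """Return D[k] = Y[k] - Y[k+1] for adjacent trials.

    A difference is None as soon as either trial is missing, so a
    nested pattern in the scores gives a nested pattern in the
    differences.
    """
    return [a - b if a is not None and b is not None else None
            for a, b in zip(scores, scores[1:])]


# Hypothetical subject observed on the first three of four trials.
diffs = successive_differences([10, 7, 8, None])
```

A subject observed on m trials contributes m-1 differences, so six trials yield at most five differences per subject.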




With full data, step-down tests are calculated using a Cholesky
factoring, but with nested incomplete data, it is convenient to
use a series of analyses of covariance. Differences D{ijk} are
treated as the dependent variables, with independent variables of
conditions and covariates of the previous trial differences. If
k=1, we use no covariate. This re-parameterization leads to the
series of models:


    D{ij1} = δ{1} + η{j1} + e{ij1}
    D{ij2} = δ{2} + η{j2} + a{21}D{ij1} + e{ij2}
      ...
    D{ij5} = δ{5} + η{j5} + a{51.234}D{ij1} + a{52.134}D{ij2} +
             a{53.124}D{ij3} + a{54.123}D{ij4} + e{ij5}.
The t-test for each grand mean, δ{k}, forms the set of "Trial"
step-down statistics, while the F-statistics for the "Condition"
parameters, η{jk}, are the step-down set for the interactions.
These statistics, along with df and exceedance probabilities, are
reported in Table 4. All results were obtained by running a
series of analyses of variance and covariance on PROC GLM of SAS.
Care should be exercised not to interpret the constant in GLM as
a grand mean. Each statistic is tested at α{k} = .0102, so that
the simultaneous error rate for each set is
α = 1 - (1 - α{k})^5 = .05. Subbaiah and Mudholkar (1978)
suggest more general ways of choosing α{k}.
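
The series of analyses of covariance can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the PROC GLM runs used for Table 4: conditions are effect coded (so that the intercept stands in for the grand mean parameter), covariates are entered uncentered, each step uses every subject whose differences are observed up to that step, and the function names and the small data set are hypothetical.

```python
import math


def ols(X, y):
    """Least squares by the normal equations.

    Returns (coefficients, residual sum of squares, inverse of X'X).
    """
    p = len(X[0])
    # Build the augmented matrix [X'X | I] and reduce it by Gauss-Jordan.
    A = [[sum(r[a] * r[b] for r in X) for b in range(p)]
         + [1.0 if a == c else 0.0 for c in range(p)] for a in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        d = A[col][col]
        A[col] = [v / d for v in A[col]]
        for r in range(p):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    inv = [row[p:] for row in A]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    rss = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, yi in zip(X, y))
    return beta, rss, inv


def effect_code(group, n_groups):
    """Sum-to-zero coding: the last group is coded -1 on every column."""
    if group == n_groups - 1:
        return [-1.0] * (n_groups - 1)
    return [1.0 if group == j else 0.0 for j in range(n_groups - 1)]


def step_down(diffs, groups, n_groups):
    """One t (intercept) and one F (conditions) per difference."""
    results = []
    for k in range(len(diffs[0])):
        # Keep subjects whose differences 1..k+1 are all observed.
        rows = [(d, g) for d, g in zip(diffs, groups)
                if all(v is not None for v in d[:k + 1])]
        y = [d[k] for d, _ in rows]
        full = [[1.0] + effect_code(g, n_groups) + list(d[:k])
                for d, g in rows]
        reduced = [[1.0] + list(d[:k]) for d, _ in rows]
        beta, rss_full, inv = ols(full, y)
        _, rss_reduced, _ = ols(reduced, y)
        df = len(y) - len(full[0])
        s2 = rss_full / df
        t = beta[0] / math.sqrt(s2 * inv[0][0])
        f = ((rss_reduced - rss_full) / (n_groups - 1)) / s2
        results.append({"t": t, "F": f, "df": df})
    return results


# Hypothetical differences: 3 conditions, nested dropout after D1.
diffs = [[1, 2], [2, 1], [3, None],
         [2, 3], [4, 2], [3, None],
         [5, 1], [6, 4], [4, None]]
groups = [0, 0, 0, 1, 1, 1, 2, 2, 2]
results = step_down(diffs, groups, n_groups=3)
```

Under this coding the error df at step k is the number of usable subjects minus the number of conditions minus the k-1 covariates, which agrees with the df pattern in Table 4. A package that uses a different parameterization of the condition effects will report a constant with a different meaning, which is the reason for the caution about the GLM constant.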


As can be seen, grand means for differences 2, 3, and 4 differ
significantly from zero, suggesting a trials effect. This was
also found in the univariate analysis of Table 2. None of the CT
effects were significantly different from zero, although the
statistic for the fourth difference was close.
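
The per-test level of .0102 used to flag these differences follows from the independence of the five step-down statistics. The Python sketch below reproduces the arithmetic; the function names are illustrative, and the equal split across tests is only one of the choices discussed by Subbaiah and Mudholkar (1978).

```python
def per_test_alpha(alpha, m):
    """Level for each of m independent tests giving overall level alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def simultaneous_alpha(alpha_k, m):
    """Overall error rate when m independent tests each use level alpha_k."""
    return 1.0 - (1.0 - alpha_k) ** m


alpha_k = per_test_alpha(0.05, 5)          # about .0102
overall = simultaneous_alpha(0.0102, 5)    # about .05
```

Because the statistics are independent, this level is exact rather than a bound, and it is slightly larger than the .05/5 = .01 given by a Bonferroni split.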


Summary 


Step-down statistics are simple to use when data are missing 
in a nested pattern. They do not require restrictive assumptions 
(Type H covariance matrix) of univariate analysis of variance, 
and are appropriate for a mixed model. The software currently 
available makes their calculation quite accessible. They should 
always be preferred to ANOVA when the sample is nested and 
assumption violations are likely. 

Table 1 
Split-Plot Factorial Nested Layout 

SUBJ TRIAL_1 TRIAL_2 TRIAL_3 TRIAL_4 TRIAL_5 TRIAL_6 AVERAGE 




42 




109.819 
106.145 
95.530 
74.301 
87.365 
64.503 


115.126 
114.310 
88.998 
81.241 
82.058 


Table 2 
Split-Plot Factorial ANOVA 


Source                    SS         MS        F       p

Between Subjects 
  Conditions: C                               0.22   .8043 
  Subjects (C)         4503.717   346.440 

Within Subjects 
  Trials: T            1331.024   266.205    12.81   .0001 
  CT                    225.434    22.543     1.08   .3783 
  TxSubjects (C)       2742.847    20.779 

Table 3 
Trial Differences 

Table 4 
Step-Down Tests 


                 Trial Effects            CT Effects 
Difference      t     df     p         F      df      p 
1 vs. 2       -2.35   42   .0235      0.34   2,42   .7138 
2 vs. 3       -3.02   31   .0050*     2.65   2,31   .0868 
3 vs. 4       -2.92   23   .0077*     0.55   2,23   .5059 
4 vs. 5       -4.96   17   .0001*     5.65   2,17   .0131 
5 vs. 6       -0.18    9   .8641      0.26   2,9    .7740 


* Significant at α{k} = .0102 




